emotion distribution
HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning
Zheng, Chuhang, Tian, Chunwei, Wen, Jie, Zhang, Daoqiang, Zhu, Qi
Multi-modal emotion recognition has garnered increasing attention in recent years because it plays a significant role in human-computer interaction (HCI). Since different discrete emotions may be present at the same time, emotion distribution learning (EDL), which identifies a mixture of basic emotions, has gradually emerged as an alternative to single-class emotion recognition. However, existing EDL methods struggle to mine the heterogeneity among multiple modalities, and they do not fully exploit the rich semantic correlations among basic emotions. In this paper, we propose a multi-modal emotion distribution learning framework, named HeLo, that fully explores the heterogeneity and complementary information in multi-modal emotional data as well as the label correlation among mixed basic emotions. Specifically, we first adopt cross-attention to effectively fuse the physiological data. Then, an optimal transport (OT)-based heterogeneity mining module is devised to mine the interaction and heterogeneity between the physiological and behavioral representations. To facilitate label correlation learning, we introduce a learnable label embedding optimized by correlation matrix alignment. Finally, the learnable label embeddings and label correlation matrices are integrated with the multi-modal representations through a novel label correlation-driven cross-attention mechanism for accurate emotion distribution learning. Experimental results on two publicly available datasets demonstrate the superiority of our proposed method in emotion distribution learning.
- North America > United States > District of Columbia > Washington (0.05)
- Oceania > Australia > Australian Capital Territory > Canberra (0.05)
- Asia > China > Jiangsu Province > Nanjing (0.05)
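The HeLo abstract mentions an optimal transport (OT)-based module for relating physiological and behavioral representations but gives no details. As a hedged illustration of the OT building block only, the sketch below computes an entropy-regularized transport plan with the standard Sinkhorn iterations. The squared Euclidean cost, the regularization `eps`, the uniform token weights and the toy features are all assumptions, and this is not HeLo's actual module; the resulting plan can be read as a soft alignment between tokens of the two modalities.

```python
import math


def squared_dist_cost(xs, ys):
    """Pairwise squared Euclidean distances between two lists of feature vectors."""
    return [[sum((p - q) ** 2 for p, q in zip(x, y)) for y in ys] for x in xs]


def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized OT plan between weights a (rows) and b (columns).

    Alternately rescales the Gibbs kernel exp(-cost / eps) so that the plan's
    row sums match a and its column sums match b.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical toy features: 2 physiological tokens, 3 behavioral tokens.
phys = [[0.0, 1.0], [1.0, 0.0]]
beh = [[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]]
plan = sinkhorn(squared_dist_cost(phys, beh), [0.5, 0.5], [1 / 3] * 3)
```

Each physiological token sends most of its mass to the behavioral token closest to it, while the third, equidistant behavioral token is shared. Smaller `eps` gives a sharper (closer to unregularized) plan at the cost of slower convergence.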
Evaluating the Capabilities of Large Language Models for Multi-label Emotion Understanding
Belay, Tadesse Destaw, Azime, Israel Abebe, Ayele, Abinew Ali, Sidorov, Grigori, Klakow, Dietrich, Slusallek, Philipp, Kolesnikova, Olga, Yimam, Seid Muhie
Large Language Models (LLMs) show promising learning and reasoning abilities. Compared to other NLP tasks, however, multilingual and multi-label emotion evaluation remains under-explored for LLMs. In this paper, we present EthioEmo, a multi-label emotion classification dataset for four Ethiopian languages: Amharic (amh), Afan Oromo (orm), Somali (som), and Tigrinya (tir). We perform extensive experiments with an additional English multi-label emotion dataset from SemEval 2018 Task 1. Our evaluation covers encoder-only, encoder-decoder, and decoder-only language models. We compare zero-shot and few-shot approaches with LLMs against fine-tuning of smaller language models. The results show that multi-label emotion classification accuracy is still insufficient even for high-resource languages such as English, and that there is a large performance gap between high-resource and low-resource languages. Performance also varies with the language and the model type. EthioEmo is publicly available to further improve the understanding of emotions in language models and of how people convey emotions across languages.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > Dominican Republic (0.14)
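Because EthioEmo is a multi-label task, a single example can carry several emotions at once, so scoring differs from single-label accuracy. The abstract does not say which metrics are reported; micro- and macro-averaged F1 are common choices for multi-label emotion classification, and the sketch below computes both from per-example label sets. The label names and toy predictions are hypothetical, and treating a label that never appears in gold or predictions as F1 = 0 is a convention, not something the source specifies.

```python
def multilabel_f1(y_true, y_pred, labels):
    """Micro- and macro-averaged F1 for multi-label predictions.

    y_true and y_pred are lists of label sets, one set per example. A label with no
    true positives, false positives or false negatives gets F1 = 0 (a convention).
    """
    tp = dict.fromkeys(labels, 0)
    fp = dict.fromkeys(labels, 0)
    fn = dict.fromkeys(labels, 0)
    for gold_set, pred_set in zip(y_true, y_pred):
        for lab in labels:
            if lab in gold_set and lab in pred_set:
                tp[lab] += 1
            elif lab in pred_set:
                fp[lab] += 1
            elif lab in gold_set:
                fn[lab] += 1

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    # Macro: average per-label F1; micro: F1 over pooled counts.
    macro = sum(f1(tp[lab], fp[lab], fn[lab]) for lab in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


labels = ["joy", "sadness", "anger", "fear"]
gold = [{"joy"}, {"sadness", "anger"}, {"fear"}]
pred = [{"joy"}, {"sadness"}, {"joy", "fear"}]
micro, macro = multilabel_f1(gold, pred, labels)  # 0.75, 2/3
```

Macro-F1 weights every emotion equally, so a missed rare emotion ("anger" here) pulls it down more than micro-F1, which is one reason low-resource and imbalanced settings are often reported with both.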
Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description
Dehghani, Mahshid, Shafiee, Amirahmad, Shafiei, Ali, Fallah, Neda, Alizadeh, Farahmand, Gholinejad, Mohammad Mehdi, Behroozi, Hamid, Habibi, Jafar, Asgari, Ehsaneddin
Existing 3D facial emotion modeling has been constrained by limited emotion classes and insufficient datasets. This paper introduces "Emo3D", an extensive "Text-Image-Expression dataset" spanning a wide spectrum of human emotions, each paired with images and 3D blendshapes. Leveraging Large Language Models (LLMs), we generate a diverse array of textual descriptions that capture a broad spectrum of emotional expressions. Using this unique dataset, we conduct a comprehensive evaluation of fine-tuned language-based models and of vision-language models such as Contrastive Language-Image Pre-training (CLIP) for 3D facial expression synthesis. We also introduce a new evaluation metric for this task that more directly measures the conveyed emotion. This metric, Emo3D, proves superior to Mean Squared Error (MSE) in assessing visual-text alignment and semantic richness in 3D facial expressions associated with human emotions. "Emo3D" has broad applications in animation design, virtual reality, and emotional human-computer interaction.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Dual-Constrained Dynamical Neural ODEs for Ambiguity-aware Continuous Emotion Prediction
Wu, Jingyao, Dang, Ting, Sethu, Vidhyasaharan, Ambikairajah, Eliathamby
There has been a significant focus on modelling emotion ambiguity in recent years, with advances in representing emotions as distributions to capture ambiguity. However, comparatively little effort has been devoted to the temporal dependencies in emotion distributions, which encode ambiguity in perceived emotions that evolve smoothly over time. Recognizing the benefits of constrained dynamical neural ordinary differential equations (CD-NODE) for modelling time series as dynamic processes, we propose an ambiguity-aware dual-constrained Neural ODE approach to model the dynamics of emotion distributions on arousal and valence. In our approach, we use ODEs parameterised by neural networks to estimate the distribution parameters, and we integrate additional constraints that restrict the range of the system outputs to ensure the validity of the predicted distributions. We evaluated the proposed system on the publicly available RECOLA dataset and observed very promising performance across a range of evaluation metrics.
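To make the idea of "constraining the system outputs" concrete, the sketch below integrates a tiny latent ODE with forward Euler and maps each latent state to Gaussian parameters for one affect dimension: the mean is squashed into [-1, 1] with `tanh` and the standard deviation into a bounded positive range with a sigmoid. This is only a minimal, untrained illustration under stated assumptions: the one-layer vector field, the Euler solver, the Gaussian parameterization, the bounds `sigma_min`/`sigma_max` and all weights are hypothetical, not the paper's dual-constrained CD-NODE.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dynamics(z, x, W, U):
    """Hypothetical one-layer vector field: dz/dt = tanh(W z + U x)."""
    return [
        math.tanh(sum(w * zk for w, zk in zip(W[i], z)) + sum(u * xk for u, xk in zip(U[i], x)))
        for i in range(len(z))
    ]


def integrate(z0, inputs, W, U, dt=0.1):
    """Forward-Euler integration, one step per input frame; returns the latent trajectory."""
    z, traj = list(z0), []
    for x in inputs:
        z = [zi + dt * dzi for zi, dzi in zip(z, dynamics(z, x, W, U))]
        traj.append(z)
    return traj


def readout(z, w_mu, w_sigma, sigma_min=0.01, sigma_max=0.5):
    """Constrained Gaussian parameters: mean in [-1, 1], std in [sigma_min, sigma_max]."""
    mu = math.tanh(sum(a * b for a, b in zip(w_mu, z)))
    s = sigmoid(sum(a * b for a, b in zip(w_sigma, z)))
    return mu, sigma_min + (sigma_max - sigma_min) * s


# Untrained toy weights and a constant 2-d input sequence of 20 frames.
W = [[-0.5, 0.1], [0.0, -0.5]]
U = [[1.0, 0.0], [0.0, 1.0]]
traj = integrate([0.0, 0.0], [[0.5, -0.2]] * 20, W, U)
arousal = [readout(z, [1.0, -1.0], [0.5, 0.5]) for z in traj]
```

Whatever the latent state does, every predicted (mean, std) pair stays a valid distribution in the assumed label range; a second readout with its own weights would handle valence.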
Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation
Wu, Wen, Li, Bo, Zhang, Chao, Chiu, Chung-Cheng, Li, Qiujia, Bai, Junwen, Sainath, Tara N., Woodland, Philip C.
The subjective perception of emotion leads to inconsistent labels from human annotators. Typically, utterances lacking majority-agreed labels are excluded when training an emotion classifier, which causes problems when ambiguous emotional expressions are encountered during testing. This paper investigates three methods to handle ambiguous emotion. First, we show that incorporating utterances without majority-agreed labels as an additional class in the classifier reduces the classification performance of the other emotion classes. Then, we propose detecting utterances with ambiguous emotions as out-of-domain samples by quantifying the uncertainty in emotion classification using evidential deep learning. This approach retains the classification accuracy while effectively detecting ambiguous emotion expressions. Furthermore, to obtain fine-grained distinctions among ambiguous emotions, we propose representing emotion as a distribution instead of a single class label. The task is thus re-framed from classification to distribution estimation, where every individual annotation is taken into account, not just the majority opinion. The evidential uncertainty measure is extended to quantify the uncertainty in emotion distribution estimation. Experimental results on the IEMOCAP and CREMA-D datasets demonstrate the superior capability of the proposed method in terms of majority class prediction, emotion distribution estimation, and uncertainty estimation.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Czechia > South Moravian Region > Brno (0.04)
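Evidential deep learning, as used above to flag ambiguous utterances, treats the classifier outputs as evidence for a Dirichlet distribution over class probabilities. The sketch below follows the widely used recipe: non-negative evidence (ReLU here, one common choice), Dirichlet parameters alpha = evidence + 1, expected probabilities alpha / S and uncertainty mass u = K / S, where S is the sum of alpha and K the number of classes. The out-of-domain threshold of 0.5 is a hypothetical value, and the paper's exact evidence function and threshold may differ.

```python
def evidential_outputs(logits):
    """Map logits to Dirichlet parameters, expected probabilities and uncertainty.

    Evidence e_k = max(0, logit_k) (ReLU, one common choice); alpha_k = e_k + 1;
    S = sum(alpha); expected probabilities alpha_k / S; uncertainty u = K / S.
    """
    alpha = [max(0.0, z) + 1.0 for z in logits]
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    uncertainty = len(alpha) / strength
    return alpha, probs, uncertainty


def is_ambiguous(logits, threshold=0.5):
    """Hypothetical rule: flag an utterance as out-of-domain (ambiguous) when u > threshold."""
    return evidential_outputs(logits)[2] > threshold


_, probs, u_clear = evidential_outputs([5.0, 0.0, 0.0, 0.0])
_, _, u_flat = evidential_outputs([0.0, 0.0, 0.0, 0.0])
```

With strong evidence for one class the uncertainty drops to 4/9; with no evidence at all it is exactly 1, so the utterance is flagged while the argmax prediction for confident utterances is unchanged.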
Estimating the Uncertainty in Emotion Class Labels with Utterance-Specific Dirichlet Priors
Wu, Wen, Zhang, Chao, Wu, Xixin, Woodland, Philip C.
Emotion recognition is a key attribute for artificial intelligence systems that need to interact naturally with humans. However, the task definition is still an open problem due to the inherent ambiguity of emotions. In this paper, a novel Bayesian training loss based on per-utterance Dirichlet prior distributions is proposed for verbal emotion recognition, which models the uncertainty in one-hot labels created when human annotators assign the same utterance to different emotion classes. An additional metric is used to evaluate performance by detecting test utterances with high labelling uncertainty. This removes a major limitation that emotion classification systems only consider utterances with labels where the majority of annotators agree on the emotion class. Furthermore, a frequentist approach is studied to leverage the continuous-valued "soft" labels obtained by averaging the one-hot labels. We propose a two-branch model structure for emotion classification on a per-utterance basis, which achieves state-of-the-art classification results on the widely used IEMOCAP dataset. Based on this, uncertainty estimation experiments were performed. The best performance in terms of the area under the precision-recall curve when detecting utterances with high uncertainty was achieved by interpolating the Bayesian training loss with the Kullback-Leibler divergence training loss for the soft labels. The generality of the proposed approach was verified using the MSP-Podcast dataset, which yielded the same pattern of results.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
- Asia > China > Hong Kong (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Personal (0.93)
- Research Report > New Finding (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
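The frequentist branch described above averages the annotators' one-hot labels into a soft label and trains against it with a Kullback-Leibler divergence loss. A minimal sketch of those two steps, assuming a four-class label set and using natural logarithms; the class names, the `eps` floor on predicted probabilities and the toy annotations are illustrative choices. The interpolation with the Bayesian loss (a weighted sum of the two) is not shown.

```python
import math


def soft_label(annotations, classes):
    """Average the annotators' one-hot labels into a class distribution."""
    return [sum(1 for a in annotations if a == c) / len(annotations) for c in classes]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; classes with p_k = 0 contribute nothing."""
    return sum(pk * math.log(pk / max(qk, eps)) for pk, qk in zip(p, q) if pk > 0)


classes = ["neutral", "happy", "sad", "angry"]
target = soft_label(["happy", "happy", "neutral"], classes)  # [1/3, 2/3, 0, 0]
loss_uniform = kl_divergence(target, [0.25] * 4)
loss_perfect = kl_divergence(target, target)
```

The loss is zero only when the prediction reproduces the annotator split exactly, so an utterance with 2-to-1 disagreement is no longer forced onto a single majority class.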
Distribution-based Emotion Recognition in Conversation
Wu, Wen, Zhang, Chao, Woodland, Philip C.
Automatic emotion recognition in conversation (ERC) is crucial for emotion-aware conversational artificial intelligence. This paper proposes a distribution-based framework that formulates ERC as a sequence-to-sequence problem for emotion distribution estimation. The inherent ambiguity of emotions and the subjectivity of human perception lead to disagreements in emotion labels, which are handled naturally in our framework from the perspective of uncertainty estimation in emotion distributions. A Bayesian training loss is introduced to improve the uncertainty estimation by conditioning each emotional state on an utterance-specific Dirichlet prior distribution. Experimental results on the IEMOCAP dataset show that ERC outperformed the single-utterance-based system, and that the proposed distribution-based ERC methods not only achieve better classification accuracy but also show improved uncertainty estimation.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > Canada > Ontario > Toronto (0.05)
- North America > Canada > Quebec > Montreal (0.04)
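One standard way to score a set of disagreeing annotations under an utterance-specific Dirichlet prior is the Dirichlet-multinomial likelihood, which integrates out the categorical emotion probabilities. The sketch below computes its negative log for one ordered sequence of labels. It is an assumption offered for illustration, not necessarily the exact Bayesian loss used in the paper, and the concentration values and annotator counts are hypothetical.

```python
import math


def dirichlet_multinomial_nll(alpha, counts):
    """-log P(one ordered sequence of labels with class counts `counts` | alpha).

    The categorical probabilities p ~ Dir(alpha) are integrated out:
    P = Gamma(a0) / Gamma(a0 + n) * prod_k Gamma(alpha_k + n_k) / Gamma(alpha_k).
    """
    a0, n = sum(alpha), sum(counts)
    log_p = math.lgamma(a0) - math.lgamma(a0 + n)
    log_p += sum(math.lgamma(a + c) - math.lgamma(a) for a, c in zip(alpha, counts))
    return -log_p


# Three annotators: two chose "happy", one chose "sad" (classes: neutral, happy, sad, angry).
counts = [0, 2, 1, 0]
nll_flat = dirichlet_multinomial_nll([1.0, 1.0, 1.0, 1.0], counts)  # log(60)
nll_peaked = dirichlet_multinomial_nll([1.0, 10.0, 5.0, 1.0], counts)
```

A prior that concentrates mass on "happy" and "sad" explains this annotator split better than a flat prior, so minimizing this quantity rewards a network that predicts concentration parameters matching the observed disagreement.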
Seeking Subjectivity in Visual Emotion Distribution Learning
Yang, Jingyuan, Li, Jie, Li, Leida, Wang, Xiumei, Ding, Yuxuan, Gao, Xinbo
Visual Emotion Analysis (VEA), which aims to predict people's emotions towards different visual stimuli, has recently become an attractive research topic. Since the labels are obtained by voting among different individuals, it is more reasonable to regard VEA as a Label Distribution Learning (LDL) problem than as a single-label classification task. Existing methods often predict the visual emotion distribution with a unified network, neglecting the inherent subjectivity of the crowd voting process. In psychology, the Object-Appraisal-Emotion model has shown that each individual's emotion is affected by his/her subjective appraisal, which is in turn formed by affective memory. Inspired by this, we propose a novel Subjectivity Appraise-and-Match Network (SAMNet) to investigate the subjectivity in visual emotion distribution. To depict the diversity of the crowd voting process, we first propose Subjectivity Appraising with multiple branches, where each branch simulates the emotion evocation process of a specific individual. Specifically, we construct the affective memory with an attention-based mechanism to preserve each individual's unique emotional experience. A subjectivity loss is further proposed to guarantee the divergence between different individuals. Moreover, we propose Subjectivity Matching with a matching loss, which assigns unordered emotion labels to ordered individual predictions in a one-to-one correspondence using the Hungarian algorithm. Extensive experiments and comparisons on public visual emotion distribution datasets demonstrate that the proposed SAMNet consistently outperforms state-of-the-art methods. Ablation studies verify the effectiveness of our method, and visualizations demonstrate its interpretability.
- Asia > China > Shaanxi Province > Xi'an (0.05)
- Oceania > Australia > Australian Capital Territory > Canberra (0.05)
- Asia > China > Chongqing Province > Chongqing (0.04)
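The Subjectivity Matching step pairs unordered individual votes with ordered branch predictions one-to-one. The sketch below shows that matching for a tiny case, using as cost the negative log-probability each branch assigns to each vote (an assumed cost; the paper's matching loss may differ). It enumerates all permutations, which reaches the same optimum as the Hungarian algorithm but is only feasible for a handful of branches; for larger problems, `scipy.optimize.linear_sum_assignment` solves the same assignment efficiently. The predictions and votes are hypothetical.

```python
import math
from itertools import permutations


def min_cost_assignment(cost):
    """Exact minimum-cost one-to-one assignment for a small square cost matrix.

    Brute force over all n! permutations. It returns the same optimum the Hungarian
    algorithm would, but is only practical for a handful of branches.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = list(perm), total
    return best_perm, best_cost


# Hypothetical: 3 branches, each predicting a distribution over 3 emotions.
preds = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
votes = [2, 0, 1]  # unordered emotion labels from 3 individuals
# cost[i][j] = -log probability that branch i gives to vote j
cost = [[-math.log(p[v]) for v in votes] for p in preds]
assignment, total = min_cost_assignment(cost)  # branch i is matched to votes[assignment[i]]
```

Here each branch is matched to the vote it finds most likely (assignment `[1, 2, 0]`); once the matching is fixed, a per-branch loss can be computed against its assigned vote without imposing an arbitrary ordering on the annotators.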
Emotional Semantics-Preserved and Feature-Aligned CycleGAN for Visual Emotion Adaptation
Zhao, Sicheng, Chen, Xuanbai, Yue, Xiangyu, Lin, Chuang, Xu, Pengfei, Krishna, Ravi, Yang, Jufeng, Ding, Guiguang, Sangiovanni-Vincentelli, Alberto L., Keutzer, Kurt
Thanks to large-scale labeled training data, deep neural networks (DNNs) have achieved remarkable success in many vision and multimedia tasks. However, because of domain shift, the knowledge learned by well-trained DNNs does not generalize well to new domains or datasets that have few labels. Unsupervised domain adaptation (UDA) studies the problem of transferring models trained on one labeled source domain to another unlabeled target domain. In this paper, we focus on UDA in visual emotion analysis for both emotion distribution learning and dominant emotion classification. Specifically, we design a novel end-to-end cycle-consistent adversarial model, termed CycleEmotionGAN++. First, we generate an adapted domain to align the source and target domains at the pixel level by improving CycleGAN with a multi-scale structured cycle-consistency loss. During image translation, we propose a dynamic emotional semantic consistency loss to preserve the emotion labels of the source images. Second, we train a transferable task classifier on the adapted domain with feature-level alignment between the adapted and target domains. We conduct extensive UDA experiments on the Flickr-LDL and Twitter-LDL datasets for distribution learning and on the ArtPhoto and FI datasets for emotion classification. The results demonstrate the significant improvements yielded by the proposed CycleEmotionGAN++ compared to state-of-the-art UDA approaches.
- Oceania > Australia > Australian Capital Territory > Canberra (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Education (0.48)
- Information Technology > Services (0.35)
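CycleEmotionGAN++ builds on CycleGAN's cycle-consistency idea: translating a source image to the target style and back should reproduce the original, and vice versa. The sketch below computes only the plain CycleGAN L1 cycle loss on flat vectors with toy "translators"; the paper's multi-scale structured variant and its dynamic emotional semantic consistency loss are not reproduced, and the functions `G`, `f_inverse` and `f_identity` are hypothetical stand-ins for image generators.

```python
def mean_abs_diff(a, b):
    """Mean absolute (L1) difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(source, target, G, F):
    """Plain CycleGAN cycle loss: E_x |F(G(x)) - x|_1 + E_y |G(F(y)) - y|_1.

    G maps source -> target, F maps target -> source; each sample is a flat vector.
    """
    forward = sum(mean_abs_diff(F(G(x)), x) for x in source) / len(source)
    backward = sum(mean_abs_diff(G(F(y)), y) for y in target) / len(target)
    return forward + backward


def G(v):
    # Toy source -> target translator.
    return [2.0 * x for x in v]


def f_inverse(v):
    # Toy target -> source translator that exactly undoes G.
    return [x / 2.0 for x in v]


def f_identity(v):
    # Toy translator that ignores the domain gap, so cycles do not close.
    return list(v)


source, target = [[1.0, 2.0]], [[4.0, 0.0]]
loss_consistent = cycle_consistency_loss(source, target, G, f_inverse)
loss_broken = cycle_consistency_loss(source, target, G, f_identity)
```

When the two mappings are mutual inverses the loss is 0; otherwise it grows with how far a round trip drifts from the input, which is what keeps the pixel-level translation from discarding content, including the emotional cues the semantic consistency loss is meant to protect.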